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Design  Goals  for  Relational 
and  DBTG  Databases 

E.  Wong  and  R.  H.  Katz 
University  of  California,  Berkeley 


Abstr^t 

In  this  paper  the  goals  of  logical  design  for  rela¬ 
tional  and  DBTG  databases  are  stated  in  terms  of  the  opera¬ 
tional  requirements  of  database  maintenance  and  redesign.  A 
set  of  mapping  rules  for  attaining  these  goals  are  then 

V 

secified.  The  relational  schemas  that  result  from  an  appli- 

P 

cation  of  the  mapping  rules  are  shown  to  be  in  the  fourth 
normal  form.  ■ 

J.  •  Introduction 

The  semantic  sparseness  of  the  relational  data  model, 
while  responsible  for  much  of  its  power  in  retrieval  and 
data  manipulation,  works  iigainst  it  in  update  operations. 
This  defect  has  long  been  recognized  [C0DD71].  One  approach 
to  its  correction  is  to  aug.ment  the  semantics  of  the  rela¬ 
tional  model  with  functional  dependent^ies  and  multivalued 
^.®2®ndencies ,  and  through  these  structures  define  normal 
forms  for  relations. 

The  idea  of  normalization  in  database  design  was  pro¬ 
posed  by  Codd  [C0DD71],  and  subsequently  two  distinct  varia¬ 
tions  to  this  approach  have  emerged.  One,  called 

Research  supported  by  the  U.S.  Army  research  Office  Grant 
DAAi,  q-76-6-02^5  and  the  U.S.  Air  Force  Office  of  .Scientific 
Hese.arch  Gr^ant  78-3596. 
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decomposition,  is  a  refinement  of  Codd ' s  normalization 
[FAGI77b].  Here,  one  starts  with  a  collection  of  relations 
having  atomic  values  (i.e.,  in  first  normal  form)  and  a 
specified  list  of  multivalued  dependencies  among  the  attri¬ 
butes  of  these  relations.  The  design  proceeds  by  decomposing 
the  relations  in  the  initial  collection,  until  every  rela¬ 
tion  is  in  the  fourth  normal  form  (4NF).  It  is  important  to 
note  that  a  multivalued  dependency  is  defined  in  the  context 
of  a  specific  relation.  As  such,  it  is  a  property  of  the 
attributes  of  a  relation,  not  their  underlying  domains. 

An  alternative  to  decomposition  is  synthesis  [BER^76]. 
Here,  one  begins  with  a  set  of  functional  dependencies, 
somewhat  modified  in  the  definition  so  as  to  be  independent 
of  the  context  of  specific  relations.  The  design  then 
proceeds  by  synthesizing  a  collection  of  relations  in  the 
third  normal  form  (3NF). 

Fagin  [FAGI77b]  has  elaborated  on  the  difficulties  of 
synthesis  as  a  design  procedure.  Yet,  there  is  much  about  it 
that  we  like.  It  seems  to  us  unsatisfactory  that  the  princi¬ 
pal  semantic  objects  of  the  design  model  should  depend  on 
the  context  of  the  specific  relations,  which  after  all  are 
only  convenient  ways  of  grouping  data. 

The  difficulty  with  synthesis,  we  believe  is  due  to  the 
fact  that  even  augmented  with  functional  dependencies,  the 
relational  model  is  still  semantically  inadequate  for 
design.  Semantic  objects  that  should  be  distinguished  are 
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not,  with  the  result  that  simple  ideas  become  difficult  to 
explicate.  This  fact  was  clearly  recognized  in  [SCHM75]  and 
they  proposed  a  "basic  semantic  model"  in  order  to  distin¬ 
guish  between  "characteristics"  and  "associations"  among 
primitive  object  types.  They  showed  that  violations  of  3NF 
are  results  of  combining  in  the  same  relation  different 
semantic  objects  which  are  independent. 

Normal  forms  for  DBTG  data  models  do  not  appear  to  have 
been  considered,  but  clearly  such  models  are  also  not  immune 
from  update  anomalies. 

In  this  paper  our  primary  objective  is  to  state  some 
explicit  and  operational  design  goals  for  both  relational 
and  DBTG  data  models  and  to  propose  a  procedure  by  which 
such  goals  can  be  realized.  In  the  process  both  normal 
forms  and  the  relations 1-DBTG  schema  translation  problem  are 
illuminated.  While  we  believe  that  a  data  model  used  for 
design  must  have  sufficient  semantics  to  allow  the  goals  of 
design  to  be  clearly  stated,  we  al.so  believe  that  excessive 
semantics  is  a  burden,  and  every  semantic  object  introduced 
should  be  justified  by  a  specific  operational  purpose.  Our 
concern  is  with  the  operational  effects  of  semantics  and  not 
semantics  per  se. 

2.  A  Design  Model 

Our  point  of  departure  is  to  choose  a  basic  semantic 
data  model  which  is  to  be  used  to  specify  a  design  schema. 
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The  model  as  such  is  not  new,  being  essentially  a  simplifi¬ 
cation  of  both  the  entity-relationship  model  [CHEM7o'  and 
the  semantic  data  model  introduced  in  [SCflMTS].  BY  a  "data 
model"  we  shall  mean  a  collection  of  data  ohiecL  t.ype!' 
introduced  to  represent  data,  while  by  a  "schema"  we  shall 
mean  a  specific  choice  of  data  objects  to  represent  a  data¬ 
base.  Thus,  for  example,  the  relational  data  model  consists 
of:  rlation,  domain,  tuple,  attribute,  etc., while  a  rela¬ 
tional  schema  consists  of  a  specific  collection  of  rela¬ 
tions  . 

For  each  instance  of  time  t,  let  E^(t),  E^Ct).  ...  , 
E^(t)  be  n  distinct  sets,  which  we  shall  call  entity  sets. 
An  entity  set,  usually  referenced  by  name,  is  in  reality  a 
family  of  sets  which  change  as  members  are  inserted  and 
deleted . 


A  Pjop^rty  of  an  entity  set  E(t)  is  a  one-parameter 
family  of  fu^nctions  f  ^ ,  mapping  at  each  t  ECt)  into  some  set 
V  of  values.  Ob.serve  that  implicit  in  this  definition  are 
the  requirements  that:  (a)  at  each  t  f^  is  defined  on  all  of 
E(t),  and  (b)  for  every  e  in  E(t)  the  value  f^(e)  is  unique. 

As  an  example,  consider  the  following  entity  sets  and 
properties: 


entity  set 
emp 
dept 
job 


prop^t^ies 
ename, birthyr 
dname , location 
title, status, salary 
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A  relationship  among  entity  sets  E^(t),  ...  ,  E^^Ct) 
is  a  time-varying  relation,  i.e.,  at  each  t  is  a  subset 
of  the  cartesian  product  E^(t)  X  Ep(t)  X  ...  X  E^(t).  For 
example,  the  following  two  relationships  specify  respec¬ 
tively  the  employees  qualified  to  hold  each  job,  and  the 
jobs  allocated  to  each  department: 

qualified  (job,emp) 

allocation  (dept, job) 

A  relationship  may  optionally  have  a  property  defined  on  it. 
For  example,  ''number  allocated"  is  such  a  oroperty  for 
"allocation."  We  assume  that  the  relationships  specified  in 
a  design  schema  are  independent  and  indecomposable.  Indepen¬ 
dence  means  that  no  relationship  is  derivable  from  other 
relationships,  and  indecomposability  means  that  no  relation¬ 
ship  is  equal  to  the  join  of  two  of  its  projections  for  all 
time . 

We  shall  say  that  a  binary  relationship  R^  on  entity 
sets  E.|(t)  and  E^Ct)  is  in  E.|(t)  if  each 
entity  of  E^(t)  occurs  in  at  most  one  instance  of  R^.  If 
each  entity  in  E.|(t)  occurs  in  exactly  one  instance  of  R^, 
we  shall  call  R^  an  a^qciation.  At  each  t  an  association 
R^(E^(t),  EpCt))  is  a  function  which  maps  E.|(t)  into  Ep(t). 
For  example,  consider  the  binary  relationship  mgr (dept ,emp) . 
If  each  dept  is  required  to  have  one  and  only  one  manager  at 
all  times,  the  mgr ( dept ,emp)  is  an  association.  If  a  dept 
can  be  temporarily  without  a  manager,  then  mgr (dept ,emp)  is 
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single-valued  in  dept,  but  not  an  association.  As  we  shall 
see,  the  distinction  oetween  an  association  and  other  rela¬ 
tionships  is  i-noortant  for  both  relational  and  DBTG  sche-nas . 
The  distinction  between  single-valued  r ela t i onshi ps  and 
other  relationships  is  important  in  the  DBTG  data  model, 
where  the  set  construct  provides  a  natural  support  for  the 
single-value  property. 

We  shall  assume  that  neither  an  association  nor  a 
single-valued  relationship  has  a  property  defined  on  it.  A 
property  of  an  association  is  necessarily  a  property  of  the 
domain  entity  set  of  the  association  so  that  the  concept  is 
superflous.  To  provide  automatic  integrity  support  for  a 
single-valued  relationship,  we  shall  represent  such  a  rela¬ 
tionship  by  a  single  DBTG  set,  and  doing  so  precludes  the 
relationship  from  having  a  property. 

The  design  model  that  we  have  outlined  distinguishes 
among  the  following  semantic  objects: 

( a  )  entity  set 

(b)  properties  of  entity  sets 

(c)  associations 

(d)  single-valued  binary  relationships 

(e)  relationships 

(f)  properties  of  relationships 

An  example  of  a  design  schema  is  the  following: 


Example  2.1 

entity  set  properties 

emp  "  ename ,  bfr  t'hyr 

dept  dname , location 

job  title, salary 


r 
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^s^o  c  o  n  s 

works-inTemp ,dept) 
assignment ( emp .job) 


relationshi p 
mg'r  Tdept ,  emp ) 
qualified( emp , job ) 
allocation (dept, job) 


status 

single-valued  --- 

general  --- 

general  number 


Our  task  now  is  to  state  the  design  goals  and  to  specify  the 
mapping  rules  by  which  these  goals  can  be  realized. 


3-  Operational  G^oals  ^r  Design 

Our  goals  are  related  to  database  redesign  and  updates. 
We  define  a  database  redesign  operation  as  the  addition  or 
removal  of  one  of  the  following  semantic  object  types: 

(a)  entity  set 

(b)  property 

(c)  association 

(d)  relationship 

The  removal  of  an  entity  set  causes  the  automatic  removal  of 
all  its  properties  and  associations,  and  all  the  relation¬ 
ships  in  which  it  participates. 

Design  Goal  /M  A  redesign  shall  have  no  impact  on  programs 
that  reference  only  those  data  objects  which  survive  the 
redesign . 

For  example,  if  the  relationship  "qualified"  is  eliminated, 
a  program  to  find  all  the  employees  assigned  to  a  given  job 
would  be  unaffected. 
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We  define  an  update  operation  as  one  of  the  following: 

(a)  inserting  or  deleting  an  element  of  an  entity  set 

(b)  adding  or  removing  an  instance  of  a  relationship 

(c)  changing  the  value  of  a  property  or  an  association 
of  a  single  entity 

When  an  entity  e  is  deleted,  any  entity  having  e  as  its 
value  in  an  association  must  also  be  deleted  to  preserve  the 
integrity  of  the  association.  The  induced  deletions  are  con¬ 
sidered  additional  update  operations,  and  not  as  a  oart  of 
the  deletion  of  e.  Similarly,  the  removal  of  any  instances 
of  relationships  in  which  e  participates  is  also  considered 
to  be  additional  updates.  The  remaining  design  goals  per¬ 
tain  to  update  operations. 

Des^^  Goal  #2  An  update  of  type  (b)  cannot  spawn  additional 
update  operations. 

In  particular,  no  entity  can  be  caused  to  disappear  as 
the  result  of  the  removal  of  an  instance  of  a  relationship, 
and  the  insertion  of  an  entity  does  not  require  that  it  par¬ 
ticipate  in  any  relationship. 

Design  Gq^l  A  single  update  affects  a  single  tuple  in  the 
relational  case  and  a  single  record  occurrence  for  DBTG. 

The  change  in  the  value  of  one  function  (property  or 
association)  of  one  entity  should  reflect  in  the  change  of  a 
single  tuple  for  the  relational  data  model  and  in  a  single 
record  occurrence  for  the  DBTG  data  model. 
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*  The  Insertion/deletion  of  an  entity  should  correspond 

to  the  insertion/deletion  of  a  single  tuple  in  the  rela- 
tional  data  model  and  a  single  record  occurrence  in  the  DBTG 
data  model . 

The  addition/removal  of  a  single  instance  of  a  rela¬ 
tionship  should  correspond  to  the  insertion/deletion  of  a 
single  tuple  in  the  relational  data  model,  and  for  the  DBTG 
data  model  it  should  correspond  to  the  insertion/deletion  of 
not  more  than  one  record  occurrence. 

The  goal  of  minimality  is  not  merely  one  of  minimizing 
storage.  Indeed,  that  is  not  the  primary  issue.  Instead, 
the  main  issues  are:  (a)  ease  of  maintaining  consistency 
through  a  one-fact-one-place  rule,  and  (b)  an  economy  of 
expression  in  the  update  programs. 

Our  next  design  goal  is  achievable  only  for  the  DBTG 
data  model,  and  it  pertains  to  automatic  propagation  of 
entity  deletions.  When  an  entity  e  is  deleted,  any  entity 
having  e  as  its  value  in  an  association  must  also  be 
deleted.  For  example,  in  the  design  schema  of  example  2.1, 
if  a  job  with  title  "accountant"  is  deleted,  then  all 
employees  having  job  assignment  as  accountants  are  also 
deleted.  This  is  a  necessary  consequence  of  "assignment" 
being  an  association.  Similarly,  when  e  is  deleted  any 
instances  of  relationships  in  which  e  participates  must  be 
removed.  The  structures  of  a  DRTG  schema  can  be  used  to  sup- 
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port  automatic  propagation  of  these  induced  deletions. 

Design  Goal  For  a  DBTG  schema  the  command  to  delete  one 
entity  suffices  to  effect  all  induced  updates. 

For  the  DBTG  data  model  the  deletion  of  an  entity 
should  cause  all  induced  deletions  and  removals  to  be 
effected  without  explicit  instructions  to  do  so  in  the 
applications  program. 

Our  final  design  goal  is  also  restricted  to  the  DBTG 
data  model . 

Design  Goal  //5  The  integrity  of  being  single-valued  of  any 
relationship  is  to  be  automatically  preserved  on  updates  for 
a  DBTG  schema. 

Here,  we  want  to  take  maximum  advantage  of  the  one-to- 
many  nature  of  DBTG  sets  to  obviate  explicit  integrity  sup¬ 
port  for  single-valued  relationships. 

Mapping  Rule  for  a  Relational  .Schema 

We  define  an  idenMJ^ier  as  a  one-to-one  property  of  an 
entity  set,  designated  to  represent  the  entities.  As  such, 
the  value  of  the  identifier  for  a  given  entity  cannot  be 
changed  [  HALL76  ]  .  A  Pfli^ary  furiction  is  a  property  or  an 
association  specified  in  the  design  schema.  A  primitive 
object  is  either  a  relationship,  or  an  entity  set  in  its 
role  as  the  domain  of  a  primary  function.  We  propose  the 
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following  rules  for  nfiapping  a  design  schema  into  a  rela¬ 
tional  schema. 

(4.1)  Each  entity  set  has  an  explicit  identifier  which 
represents  it  globally  in  the  relational  model. 

(4.2)  The  identi f ier ( s )  of  a  primitive  object  together  with 
all  the  primary  functions  of  the  primitive  object  are 
grouped  in  the  same  relation  of  the  relational  schema. 

(4.3)  There  is  one  and  only  one  primitive  object  per  rela¬ 
tion  of  the  relational  schema. 

Comments : 

(1)  it  is  clear  that  an  application  of  (4.1)-(4.3)  yields  a 
relational  schema  consisting  of  one  relation  for  each  entity 
set,  and  one  relation  for  each  relationship.  The  domains  of 
a  relation  representing  an  entity  set  consist  of:  its  iden¬ 
tifier,  the  identifiers  of  its  associations,  and  its  proper¬ 
ties.  a  relation  representing  a  relationship  has  as  its 
domains;  the  identifiers  of  the  participating  entity  sets, 
and  any  properties  of  the  relationship. 

(2)  Often,  an  entity  can  be  identified  by  a  combination  of 

associations  and/or  properties.  For  example,  (dname.mgr)  may 
well  identify  dept  uniquely.  We  shall  assume  that  even  in 
such  cases,  an  explicit  identifier  is  assigned.  Arguments  in 
favor  of  an  explicit  identifier  are  many  and  to  us  per¬ 
suasive:  (a)  Values  of  properties  and  associations  can 

change,  and  such  changes  can  propagate  if  they  are  used  in 
identifiers,  thus  violating  the  minimal  update  design 
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objective,  (b)  An  identifier  is  an  incarnation  of  an  entity 
and  as  such  should  only  be  inserted  and  deleted,  not 
changed,  (c)  Finally,  it  usually  takes  more  than  one  associ¬ 
ation  and/or  property  to  uniquely  identify  an  entity  set. 
For  global  representation,  such  a  combination  is  too  ver¬ 
bose  . 

(3)  The  concept  of  an  "identifier”  is  not  the  same  as  that 
of  a  "key”  in  the  framework  of  "normalization."  A  key  is 
defined  in  terms  of  the  attributes  of  a  specific  relation, 
while  we  have  defined  an  identifier  in  terms  of  an  entity 
set  and  nothing  else.  An  identifier  is  more  than  a  one-to- 
one  function,  its  role  is  to  stand  for  the  corresponding 
entity . 

(4)  Mapping  rule  (4.2)  is  clearly  designed  to  minimize 
updates.  By  choosing  to  represent  an  association  as  a  rela¬ 
tionship,  a  designer  can  circumvent  the  automatic  deletion 
property  of  an  association,  but  at  the  price  of  sacrificing 
some  economy  in  expression  for  updates. 

(5)  We  believe  that  mapping  rule  (4.3)  by  itself  captures 
the  essence  of  normal  forms.  A  violation  of  any  one  of  the 
normal  forms  can  be  interpreted  as  a  violation  of  rule 
(4.3)  . 

Example  4.1^ 

Consider  the  design  schema  given  in  example  2.1.  Let  us 
introduce  the  following  identifiers  for  the  entity  sets:  eno 
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for  etnp,  dno  for  dept,  and  ji^d  for  job.  The  primitive 
objects  for  this  example  together  with  their  functions  are 
given  as  follows: 

primitive  objects 
emp “  *  ■  ■ 
dept 
job 
mgr 

allocation 
qualified 

These  are  mapped  into  five  relations  according  to  the  map¬ 
ping  rules  as  follows: 

EMP ( eno , ename , bir thyr .assignment , works-in) 

DEPT(dno, dname .location ,mgr ) 

JOB(jid, title, salary) 

ALLOC (dno, jid , number ) 

0UAL( jid , eno) 

5.  Normal  Forms 

A  violation  of  one  of  the  normal  forms  can  always  be 
Interpreted  as  a  violation  of  mapping  rule  (4.3).  Different 
ways  in  which  (4.3)  are  violated  correspond  to  different 
normal  forms,  and  these  can  be  classified  as  follows: 

A.  Putting  two  primitive  objects  which  have  no  entity  set 
in  common  in  the  same  relation.  This  violation  of  (4.3) 
results  in  a  relation  not  in  ?NF. 

Example:  Cartesian  product  of  JOB  and  DEPT 

B.  Putting  a  function  of  an  entity  set  and  a  relationship 
involving  it  in  the  same  relation.  This  too  results  in 
a  relation  not  in  2NF. 
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Example:  Equijoin  of  JOB  and  QUAL  on  jid 

C.  Putting  two  functions  with  different  entity  sets  as 
their  domains  in  the  same  relation.  If  this  violatioi 
is  not  one  of  category  (A)  then  it  must  involve  func¬ 
tion  f^  and  f^  of  the  form 

f  f 

1 1  1 2 

E^  -->  E2  -'>  S 

This  results  in  a  relation  not  in  3NF- 

Example:  The  equijoin  EMP(assignment= jid) JOB 

D.  Putting  two  relationships  with  a  common  entity  set 
together  in  the  same  relation.  This  violation  results 
in  a  relation  not  in  4NF. 

Consider  the  equijoin  AQ  of  ALLOC  with  QUAL.  As  it 
stands,  AQ  is  not  in  2NF  because  of  the  partial  depen¬ 
dence  of  "number"  on  the  key  (dno, jid , eno)  of  AQ.  The 
projection  AQCdno , jid , eno]  is  in  3NF  (and  hence  also 
2NF) ,  but  not  4NF.  There  are  two  multivalued  dependen¬ 
cies:  "eno  on  jid"  and  "dno  on  jid  '  in  AQfdno , j id , eno ] . 

Theorem  5-1  A  relational  schema  resulting  from  applying  the 
mapping  rules  (4.1)  -  (4.3)  to  a  design  schema  is  in  4MF. 

proof:  By  rule  (4.3),  there  is  one  and  only  one  primitive 
object  per  relation.  One  possibility  is  that  the  primitive 
object  is  an  entity  set  E  serving  as  the  domain  of  a  collec¬ 
tion  of  primary  functions.  In  this  case  its  identifier  is  a 
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key  of  the  relation,  and  every  attribute,  being  a  represen¬ 
tation  of  a  primary  function  of  E,  is  a  full  function  of  the 
identifier.  There  can  be  no  multivalued  dependency  in  such  a 
relation . 

The  other  possibility  is  that  the  primitive  object  is  a 
relationship.  The  identifiers  of  the  entity  sets  making  up 
the  relationship  comprise  a  key  of  the  relation,  and  every 
non-key  attribute  is  a  function  of  the  key.  Suppose  that 
contrary  to  the  assertion  of  the  theorem  the  relation  is  not 
in  4NF,  then  it  is  equal  to  the  join  of  two  relations 
[FAGI77a].  Either  the  identifiers  making  up  the  key  are 
split  between  these  two  relations  or  they  are  not.  If  they 
are  split  then  the  relationship  must  be  decomposible ,  con¬ 
tradicting  the  assumption  that  each  relationship  in  the 
design  schema  is  indecomposable.  If  the  key  resides 
entirely  in  one  of  the  component  relations,  then  attributes 
of  the  other  component  relation  cannot  be  functions  of  the 
key,  contradicting  (4.2).  f)ED 

6.  Mappinjg  Rules  for  a  I^TG  Schema 

The  following  mapping  rules  are  introduced  to  convert  a 
design  schema  into  a  DBTG  schema  so  as  to  achieve  our  design 
goals : 


(6.1)  Each  entity  set  has  an  explicit  identifier. 

(6.2)  For  each  entity  set  E  define  a  record  type  r(E).  The 
data  items  of  r(E)  are  made  up  of  the  identifier  of  E 


I 
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and  the  properties  of  E. 

(6.3)  For  an  association  or  single-valued  relationship  R(E^, 
E^)  where  E^  i  E^,  define  a  set  type  s(R)  with  r(Ep) 
as  the  owner  record  type  and  r(E^)  as  the  member 
record  type. 

(6.4)  For  an  association  or  single-valued  relationship 
R(E,E)  define  a  record  type  r(R)  having  no  data  item, 
and  a  pair  of  set  types  s^(R)  and  S2(R)  forming  a 
cycle  between  r(E)  and  r(R).  The  assignment  is  dep¬ 
icted  below: 


(6.5)  For  a  general  relationship  R(E^,E2.  •••  .  E^)  define  a 
confluent  hierarchy,  consisting  of  a  record  type  r(R) 
with  only  the  properties  of  R  as  its  data  items,  and  n 
set  types  s^(R),  3j,(R),  ...  ,  s^(R)  as  shown  below. 
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All  the  object  types  in  the  design  model  have  now  been 
mapped  into  object  types  in  the  DBTG  data  model.  Two  kinds 
of  record  types  have  resulted  from  the  mapping:  ones  which 
contain  an  identifier  data  item  and  those  which  do  not.  We 
shall  call  the  former  self-identified  record  types,  and  the 
latter  link  record  types.  Self-identified  record  types 
represent  entity  sets  while  link  record  types  represent 
relationships  and  possibly  associations. 

For  set  types  the  logical  concept  of  to^l  membership 
would  be  useful  in  our  context.  A  record  type  r  is  a  total 
member  of  a  set  type  s  if  every  occurrence  of  r  is  a  member 
of  an  occurrence  of  s.  A  member  that  is  not  total  is  said 
to  be  partial .  The  membership  of  a  link  record  type  in  any 
set  type  should  always  be  total.  The  membership  of  a  self- 
identified  record  type  should  be  total  in  any  set  which 
represents  an  association  but  not  otherwise.  The  concept  of 
"total"  membership  in  sets  does  not  exist  in  the  current 
version  of  the  DBTG  model  although  a  related  concept  appears 
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to  have  been  suggested  by  [NIJI75]. 

Natural  enforcement  for  "total”  membership  is  provided 
by  the  mandatory/autoinatic  option  for  de lete/ i nser t  in  DHTG, 
except  when  a  sequence  of  set  types  form  a  cycle.  In  that 
case  one  set  type  in  the  cycle  Is  required  to  be  "manual”  on 
insertion.  Under  our  mapping  rules  a  cycle  of  set  types  each 
having  a  total  member  can  arise  in  only  two  ways:  (i)  a 
two-set  cycle  representing  a  self-association,  (ii)  a  cycle 
of  associations.  In  the  first  case  we  shall  make  the  set 
type  Sp(R)  mandator y/manual .  In  the  latter  case  we  choose 
any  one  of  the  set  types  to  be  mandatory/manual. 

Summarizing  our  discussion  on  set  membership,  we  have 
the  final  mapping  rule  for  DBTG: 

(6.6)  The  membership  of  a  link  record  type  in  any  set  type 
is  total.  The  membership  of  a  self-identified  record 
type  in  any  set  that  represents  an  association  (or  is 
a  part  of  the  representation  of  a  self-association)  is 
total,  but  not  otherwise.  All  set  membership  that  are 
not  total  are  optional/manual.  All  total  memberships 
are  mandatory/automatic,  except  for  a  self-association 
or  a  cycle  of  associations.  For  the  exceptions  one  of 
the  set  types  in  the  cycle  must  have  a 

mandatory/manual  membership  option,  and  the  property 
of  being  "total"  must  then  be  supported  procedurally . 

Application  of  rules  (6.1)-(6.6)  to  example  (2.1) 
yields  the  following  DBTG  schema: 
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Except  for  the  partial  membership  of  "dept"  in  "mgr,"  all 
memberships  are  total. 


Schema  Translation  and  Equivalence 

In  section  6  we  introduced  a  distinction  between  self- 
identified  records  and  link  records,  and  between  total  and 
partial  memberships  in  a  set.  These  distinctions  can  be 
deduced  from  existing  constructs  of  the  DDL,  but  it  may  well 
be  desirable  to  make  them  explicit  in  the  DBTG  schema.  With 
this  bit  of  semantic  enhancement  in  the  DBTG  data  model,  the 
process  of  mapping  a  design  schema  into  a  DBTG  schema  using 
rules  (6.1)  -  (6.6)  becomes  reversible,  as  is  shown  in  the 

following  correspondences: 

self-identified  record  type  --> 
data  items  except  id  — > 
set  type  between  two  self-  --> 
identified  record  types 


link  record  type  (member  of  — > 
two  or  more  set  types) 
a  two-set  cycle  between  a  --> 
self-identified  record 
type  and  a  link 


entity  set 
properties 

association  or  si ngle-valued 
relationship,  according  to 
whether  membership  is  total 
or  partial 
relationship 

association  or  single-valued 
relationship  between  an  entity 
set  and  itself,  differentiated 
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by  set  tnembership 

It  follows  that  a  DDTG  schema  derived  from  a  design 
schema  by  following  our  mapping  rules  can  be  translated  into 
a  relational  schema  which  is  the  same  as  what  would  be 
obtained  by  directly  mapping  the  design  schema.  The  rules  of 
translation  are  rather  simple  and  given  below. 

For  a  self-identified  record  define  its  key  to  be  its 
identifier.  For  a  link  record  define  its  l^y  to  be  the  col¬ 
lection  of  the  keys  of  the  owners  of  all  sets  in  which  the 
link  record  is  a  member.  For  a  DBTG  schema  obtained  from 
using  our  mapping  rules  link  records  can  only  be  owned  by 
self-identified  records,  so  that  the  definition  of  key  is 
not  circular.  We  propose  the  following  rules  for  DBTG  to 
relational  schema  translation. 

(7.1)  For  each  self-identified  record  type  r  define  a  rela¬ 
tion  R(r).  Each  data  item  of  r  is  a  domain  of  R(r). 
The  key  of  the  owner  of  every  set  in  which  r  is  a 
total  member  is  also  a  domain  of  R(r). 

(7.2)  For  each  link  record  type  k  that  is  the  member  of  two 
or  more  set  types,  define  a  relation  R(k).  The  domains 
of  R(k)  consist  of  the  data  items  of  k  plus  the  keys 
of  the  owners  of  the  sets  in  which  k  is  a  member. 

(7.3)  For  each  set  type  s  which  has  a  partial  member,  define 
a  binary  relation  R(s).  The  domains  of  R(s)  are  the 
keys  of  the  owner  and  the  member  of  s. 
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In  terms  of  the  constructs  of  the  design  model,  (7.1) 
identifies  an  entity  set ,  (7.2)  a  relationship,  and  (7.3)  ^ 
single-valued  relationship. 

We  have  identified  a  collection  of  DBTG  schemas  that 
can  be  mapped  into  corresponding  relational  schemas  while 
preserving  the  design  goals.  The  translation  procedure  can 
be  applied  in  general,  but  in  doing  so,  we  impose  a  semantic 
interpretation  on  the  schema  to  be  translated.  The  success 
of  the  translation  depends  on  the  extent  to  which  the 
interpretation  is  correct. 

Discussion 

The  focus  of  the  design  methodology  that  we  have 
described  is  on  the  preservation  of  basic  semantic  integrity 
so  as  to  prevent  unwanted  side  effects  in  redesign  and 
updates.  Issues  of  performance  have  not  been  addressed. 
For  the  relational  case,  issues  of  performance  relate  only 
to  the  storage  schema  and  are  rightly  avoided  in  the  design 
of  the  conceptual  schema.  The  situation  is  somewhat  more 
complex  in  the  DBTG  case.  Constraints  on  set  implementation 
and  the  possibility  of  clustering  members  cause  the  logical 
design  to  have  an  effect  on  performance.  We  shall  try  to 
explain  this  issue  through  an  example. 

Consider  a  design  schema  containing  entity  sets:  sup- 
pliers  and  parts,  and  a  relationship  inventory  (supplies, 
parts).  Following  the  mapping  rules  of  section  6  results  in 
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a  DBTG  schema  depicted  as  follows: 


where  "inventory"  is  a  link  record  type.  It  can  be  argued 
with  some  cogency  that  if  the  only  accesses  are  through  sup¬ 
pliers,  then  the  following  design  would  be  better  for  per¬ 
formance: 


The  latter  design  can,  in  fact,  be  obtained  with  our 
procedure  by  modifying  the  design  schema.  Instead  of 
"parts",  introduce  an  entity  set  PacJ^s-of-sup(^l_ier s ,  the 
same  part  supplied  by  different  suppliers  being  separate 
entities.  The  "inventory"  relationship  is  now  single-valued 
in  "par ts-of-suppliers" ,  so  that  an  application  of  mapping 
rule  (6.3)  results  in  precisely  the  second  design.  The 
design  goals  are  still  satisfied  but  now  have  a  different 
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interpretation.  For  example,  changing  the  property  of  a 
part  is  no  longer  an  atomic  update  operation,  so  that  the 
minimality  goal,  though  attained,  no  longer  guarantees  that 
only  a  single  data-item  value  is  changed.  In  short,  one  can 
change  the  DBTG  design  by  manipulating  the  design  schema, 
but  there  are  consequences  of  doing  so  on  redesign  and 
updates  and  these  can  be  made  explicit  by  a  careful  examina¬ 
tion  of  the  design  goals. 

The  semantics  of  our  design  model  are  still  quite 
sparse.  A  number  of  possible  additional  object  types  have 
been  deliberately  omitted.  Our  criterion  for  inclusion  is, 
"does  including  it  make  a  difference  in  operational  terms?" 
Examples  such  as  "property  of  property”  and  "relationship  of 
relationships"  fail  on  this  ground.  The  answer  in  the  case 
of  "dependent  entities"  is  less  straightforward .  It  can  be 
argued  that  there  are  natural  examples,  such  as  "children  of 
employee",  of  entities  whose  existence  in  the  database 
depends  on  that  of  others.  Operationally ,  such  dependence 
is  already  provided  for  in  our  design  model  by  the  construct 
"association".  The  entity  set  "children  of  employee"  would 
have  an  association  "parent".  Deletion  of  the  parent  causes 
deletion  of  the  child  in  conformity  with  our  rules  on 
automatic  deletion.  Insertion  of  a  child  cannot  be  made 
until  the  parent  exists.  The  only  difference  that  the 
semantic  construct  "dependent  entities"  might  make  on  the 
design  relates  to  the  choice  of  an  Identifier.  It  might  be 
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argued  that  the  identifier  of  a  dependent  entity  should 
always  be  a  concatenation  of  the  parent-id  and  a  local-id, 
and  there  is  no  provision  in  our  rules  for  enforcing  such  a 
choice.  However,  it  seems  to  us  that  the  additional  benefit 
is  small  and  does  not  warrant  adding  yet  another  semantic 
object . 


-24- 


Design  Goals 


Wong  4  Katz 


*  References 

[BERN76]  Bernstein,  P.  A.  "Synthesizing  third  normal  form 
I  relations  from  functional  dependencies."  Transac¬ 
tions  on  Database  Systems  1,4  (Dec.  1976),  pp .  277- 

298. 

[CHEN76]  Chen,  P.  P.  "The  entity-relationship  model 

towards  a  unified  view  of  data."  Transactioiis  on 
Database  Systems  1,1  (Mar.  1976),  pp .  9-36. 

[C0DD71]  Codd,  E.  F.  "Further  normalization  of  the  data  base 
relational  model."  Courant  Computer  Science  Symposia 
6,  Da^  Base  Systems ,  Prentice-Hall,  New  York,  (May 
1971T,  pp*  .“55-98. 

[FAGI77a]  Fagin,  R.  "Multivalued  dependencies  and  a  new  nor¬ 
mal  form  for  relational  databases."  Transactions  on 
Database  Systems  2,3  (Sep.  1977),  PP.  262-278. 

[FAGI77b]  Fagin,  R.  "The  decomposition  versus  the  synthetic 
approach  to  relational  database  design."  Proceedings 
1977  Very  Large  Data  Bases  Conference,  1977,  PP. 
441-446 . 

[HALL76]  Hall,  P.,  Owlatt,  J.,  and  Todd,  S. ,  "Relations  and 
entities".  Proceedings  IFIPTC-2  Working  Conference 
on  Modelling  in  Database  Management  Systems,  North 
Holland,  1976. 

[NIJI75]  Nijssen,  G.  M.  "Set  and  CODASYL  set  or  coset."  ^ta 
Base  Description ,  B.  C.  M.  Douque,  G.  M.  Nijsseni" 
eds..  Nor th-HoTland ,  Amsterdam,  1975,  pp.  1-70. 

[SCHM75]  Schmid,  H.  A.,  Swenson,  J.  R.,  "On  the  semantics  of 
the  relational  data  model."  Proceedings  ACM-Sigmod 
Conference,  (May  1976),  pp .  9-36. 


-25- 


